{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Estimating the Risk Premia using Fama-MacBeth Regressions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example highlights how to implement a Fama-MacBeth 2-stage regression to estimate factor risk premia, make inference on the risk premia, and test whether a linear factor model can explain a cross-section of portfolio returns. This example closely follows [Cochrane::2001] (See also [JagannathanSkoulakisWang::2010]). As in the previous example, the first segment contains the imports. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import statsmodels.api as sm\n",
    "from numpy import (\n",
    "    array,\n",
    "    cov,\n",
    "    diag,\n",
    "    eye,\n",
    "    hstack,\n",
    "    kron,\n",
    "    mat,\n",
    "    mean,\n",
    "    multiply,\n",
    "    ones,\n",
    "    savez_compressed,\n",
    "    sqrt,\n",
    "    squeeze,\n",
    "    vstack,\n",
    "    zeros,\n",
    ")\n",
    "from numpy.linalg import inv\n",
    "from pandas import read_csv\n",
    "from scipy.stats import chi2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, the data are imported. I formatted the data downloaded from Ken French's website into an easy-to-import CSV which can be read by `pandas.read_csv`. The data is split using named columns for the small sets of variables and `ix` for the portfolios. The code uses pure NumPy arrays, and so `values` is used to retrieve the array from the DataFrame. The dimensions are determined using `shape`. Finally the risk free rate is forced to have 2 dimensions so that it will be broadcastable with the portfolio returns in the construction of the excess returns to the Size and Value-weighted portfolios. `asmatrix` is used to return matrix views of all of the arrays. This code is linear algebra-heavy and so matrices are easier to use than arrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = read_csv(\"FamaFrench.csv\")\n",
    "\n",
    "# Split using both named colums and ix for larger blocks\n",
    "dates = data[\"date\"].values\n",
    "factors = data[[\"VWMe\", \"SMB\", \"HML\"]].values\n",
    "riskfree = data[\"RF\"].values\n",
    "portfolios = data.iloc[:, 5:].values\n",
    "\n",
    "# Use mat for easier linear algebra\n",
    "factors = mat(factors)\n",
    "riskfree = mat(riskfree)\n",
    "portfolios = mat(portfolios)\n",
    "\n",
    "# Shape information\n",
    "t, k = factors.shape\n",
    "t, n = portfolios.shape\n",
    "# Reshape rf and compute excess returns\n",
    "riskfree.shape = t, 1\n",
    "excess_returns = portfolios - riskfree"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "The next block does 2 things:\n",
    "\n",
    "1. Compute the time-series $\\beta$s. This is done be regressing the full array of excess returns on the factors (augmented with a constant) using lstsq.\n",
    "2. Compute the risk premia using a cross-sectional regression of average excess returns on the estimates $\\beta$s. This is a standard regression where the step 1 $\\beta$ estimates are used as regressors, and the dependent variable is the average excess return."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Time series regressions\n",
    "x = sm.add_constant(factors)\n",
    "ts_res = sm.OLS(excess_returns, x).fit()\n",
    "alpha = ts_res.params[0]\n",
    "beta = ts_res.params[1:]\n",
    "avgexcess_returns = mean(excess_returns, 0)\n",
    "# Cross-section regression\n",
    "cs_res = sm.OLS(avgexcess_returns.T, beta.T).fit()\n",
    "risk_premia = cs_res.params"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "The asymptotic variance requires computing the covariance of the demeaned returns and the weighted pricing errors. The problem is formulated using 2-step GMM where the moment conditions are \n",
    "\\begin{equation}\n",
    "g_{t}\\left(\\theta\\right)=\\left[\\begin{array}{c}\n",
    "\\epsilon_{1t}\\\\\n",
    "\\epsilon_{1t}f_{t}\\\\\n",
    "\\epsilon_{2t}\\\\\n",
    "\\epsilon_{2t}f_{t}\\\\\n",
    "\\vdots\\\\\n",
    "\\epsilon_{Nt}\\\\\n",
    "\\epsilon_{Nt}f_{t}\\\\\n",
    "\\beta u_{t}\n",
    "\\end{array}\\right]\n",
    "\\end{equation}\n",
    "\n",
    "where $\\epsilon_{it}=r_{it}^{e}-\\alpha_{i}-\\beta_{i}^{\\prime}f_{t}$, $\\beta_{i}$ is a $K$ by 1 vector of factor loadings, $f_{t}$ is a $K$ by 1 set of factors, $\\beta=\\left[\\beta_{1}\\,\\beta_{2}\\ldots\\beta_{N}\\right]$ is a $K$ by $N$ matrix of all factor loadings, $u_{t}=r_{t}^{e}-\\beta'\\lambda$ are the $N$ by 1 vector of pricing errors and $\\lambda$ is a $K$  by 1 vector of risk premia. \n",
    "The vector of parameters is then $\\theta= \\left[\\alpha_{1}\\:\\beta_{1}^{\\prime}\\:\\alpha_{2}\\:\\beta_{2}^{\\prime}\\:\\ldots\\:\\alpha_{N}\\,\\beta_{N}^{\\prime}\\:\\lambda'\\right]'$\n",
    " To make inference on this problem, the derivative of the moments with respect to the parameters, $\\partial g_{t}\\left(\\theta\\right)/\\partial\\theta^{\\prime}$ is needed. With some work, the estimator of this matrix can be seen to be \n",
    " \n",
    "\\begin{equation}\n",
    " G=E\\left[\\frac{\\partial g_{t}\\left(\\theta\\right)}{\\partial\\theta^{\\prime}}\\right]=\\left[\\begin{array}{cc}\n",
    "-I_{n}\\otimes\\Sigma_{X} & 0\\\\\n",
    "G_{21} & -\\beta\\beta^{\\prime}\n",
    "\\end{array}\\right].\n",
    "\\end{equation}\n",
    "\n",
    "where $X_{t}=\\left[1\\: f_{t}^{\\prime}\\right]'$  and $\\Sigma_{X}=E\\left[X_{t}X_{t}^{\\prime}\\right]$. $G_{21}$ is a matrix with the structure \n",
    "\n",
    "\\begin{equation}\n",
    "G_{21}=\\left[G_{21,1}\\, G_{21,2}\\,\\ldots G_{21,N}\\right]\n",
    "\\end{equation}\n",
    "\n",
    "where \n",
    "\n",
    "\\begin{equation}\n",
    "G_{21,i}=\\left[\\begin{array}{cc} \n",
    "0_{K,1} & \\textrm{diag}\\left(E\\left[u_{i}\\right]-\\beta_{i}\\odot\\lambda\\right)\\end{array}\\right]\\end{equation}\n",
    "\n",
    "and where $E\\left[u_{i}\\right]$ is the expected pricing error. In estimation, all expectations are replaced with their sample analogues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Moment conditions\n",
    "X = sm.add_constant(factors)\n",
    "p = vstack((alpha, beta))\n",
    "epsilon = excess_returns - x @ p\n",
    "moments1 = kron(epsilon, ones((1, k + 1)))\n",
    "moments1 = multiply(moments1, kron(ones((1, n)), x))\n",
    "u = excess_returns - risk_premia[None, :] @ beta\n",
    "moments2 = u * beta.T\n",
    "# Score covariance\n",
    "S = mat(cov(hstack((moments1, moments2)).T))\n",
    "# Jacobian\n",
    "G = mat(zeros((n * k + n + k, n * k + n + k)))\n",
    "sigma_x = (x.T @ x) / t\n",
    "G[: n * k + n, : n * k + n] = kron(eye(n), sigma_x)\n",
    "G[n * k + n :, n * k + n :] = -beta @ beta.T\n",
    "for i in range(n):\n",
    "    temp = zeros((k, k + 1))\n",
    "    values = mean(u[:, i]) - multiply(beta[:, i], risk_premia)\n",
    "    temp[:, 1:] = diag(values)\n",
    "    G[n * k + n :, i * (k + 1) : (i + 1) * (k + 1)] = temp\n",
    "\n",
    "vcv = inv(G.T) * S * inv(G) / t"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The $J$-test examines whether the average pricing errors, $\\hat{\\alpha}$, are zero. The $J$ statistic has an asymptotic $\\chi_{N}^{2}$  distribution, and the model is badly rejected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vcv_alpha = vcv[0 : n * k + n : 4, 0 : n * k + n : 4]\n",
    "J = alpha @ inv(vcv_alpha) @ alpha.T\n",
    "J = J[0, 0]\n",
    "Jpval = 1 - chi2(25).cdf(J)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The final block using formatted output to present all of the results in a readable manner."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vcvrisk_premia = vcv[n * k + n :, n * k + n :]\n",
    "annualized_rp = 12 * risk_premia\n",
    "arp = list(squeeze(annualized_rp))\n",
    "arp_se = list(sqrt(12 * diag(vcvrisk_premia)))\n",
    "print(\"        Annualized Risk Premia\")\n",
    "print(\"           Market       SMB        HML\")\n",
    "print(\"--------------------------------------\")\n",
    "print(f\"Premia     {arp[0]:0.4f}    {arp[1]:0.4f}     {arp[2]:0.4f}\")\n",
    "print(f\"Std. Err.  {arp_se[0]:0.4f}    {arp_se[1]:0.4f}     {arp_se[2]:0.4f}\")\n",
    "print(\"\\n\\n\")\n",
    "\n",
    "print(f\"J-test:   {J:0.4f}\")\n",
    "print(f\"P-value:   {Jpval:0.4f}\")\n",
    "\n",
    "i = 0\n",
    "beta_se = []\n",
    "for j in range(5):\n",
    "    for m in range(5):\n",
    "        a = alpha[i]\n",
    "        b = beta[:, i]\n",
    "        variances = diag(\n",
    "            vcv[(k + 1) * i : (k + 1) * (i + 1), (k + 1) * i : (k + 1) * (i + 1)]\n",
    "        )\n",
    "        beta_se.append(sqrt(variances))\n",
    "        s = sqrt(variances)\n",
    "        c = hstack((a, b))\n",
    "        t = c / s\n",
    "        print(f\"Size: {j+1}, Value:{m+1}   Alpha   Beta(VWM)   Beta(SMB)   Beta(HML)\")\n",
    "        print(\n",
    "            f\"Coefficients: {a:>10,.4f}  {b[0]:>10,.4f}  {b[1]:>10,.4f}  {b[2]:>10,.4f}\"\n",
    "        )\n",
    "        print(\n",
    "            f\"Std Err.      {s[0]:>10,.4f}  {s[1]:>10,.4f}  {s[2]:>10,.4f}  {s[3]:>10,.4f}\"\n",
    "        )\n",
    "        print(\n",
    "            f\"T-stat        {t[0]:>10,.4f}  {t[1]:>10,.4f}  {t[2]:>10,.4f}  {t[3]:>10,.4f}\"\n",
    "        )\n",
    "        print(\"\")\n",
    "        i += 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The final block converts the standard errors of $\\beta$ to be an array and saves the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "beta_se = array(beta_se)\n",
    "savez_compressed(\n",
    "    \"fama-macbeth-results\",\n",
    "    alpha=alpha,\n",
    "    beta=beta,\n",
    "    beta_se=beta_se,\n",
    "    arp_se=arp_se,\n",
    "    arp=arp,\n",
    "    J=J,\n",
    "    Jpval=Jpval,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Save Results\n",
    "\n",
    "Save the estimated values for use in the $\\LaTeX$ notebook. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from numpy import savez\n",
    "\n",
    "savez(\n",
    "    \"fama-macBeth-results.npz\",\n",
    "    arp=arp,\n",
    "    beta=beta,\n",
    "    arp_se=arp_se,\n",
    "    beta_se=beta_se,\n",
    "    J=J,\n",
    "    Jpval=Jpval,\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
